
 domain-specific machine


A Data-centric Framework for Improving Domain-specific Machine Reading Comprehension Datasets

arXiv.org Artificial Intelligence

Low-quality data can cause downstream problems in high-stakes applications. A data-centric approach emphasizes improving dataset quality to enhance model performance. High-quality datasets are needed for training general-purpose Large Language Models (LLMs) as well as domain-specific models; domain-specific datasets are usually small because it is costly to engage a large number of domain experts to create them. Thus, it is vital to ensure high-quality domain-specific training data. In this paper, we propose a framework for enhancing the data quality of original datasets. We applied the proposed framework to four biomedical datasets and showed relative improvements of up to 33% and 40% for fine-tuning retrieval and reader models, respectively, on the BioASQ dataset when using back translation to enhance the original dataset quality.


Thwart Insider Threats with Machine Learning [Infographic] – Blog Imperva

#artificialintelligence

Potentially the most lethal kind of threat to an organization's security, insider threats can pose risks as significant as, if not greater than, those of external attacks. Because insiders are granted trusted access to sensitive data, these threats often fly under the security radar. By examining how users access your data and identifying when inappropriate or abusive behavior takes place, machine learning can help you secure your data from insider threats. In our insider threat infographic, we examine machine learning and how it works to detect and prevent insider threats. Below, we summarize the primary types of insider threat profiles and explain how domain-specific machine learning helps identify insider threats and protect sensitive data.